Disease-Specific Risk Prediction through Stability Selection using Electronic Health Records

Authors

  • Jiayu Zhou
  • Jimeng Sun
  • Yashu Liu
  • Jianying Hu
  • Jieping Ye
Abstract

Disease-specific risk prediction aims to assess a patient's risk of developing a target disease based on his or her health profile. As electronic health records (EHRs) become more prevalent, a large number of features can be constructed to characterize patient profiles. This wealth of data provides unprecedented opportunities for data mining researchers to address important biomedical questions. Practical data mining challenges include: How can those features be correctly selected and ranked by their predictive power? Which predictive model performs best in predicting a target disease from those features? In this paper, we propose top-k stability selection, which generalizes stability selection, a powerful sparse learning method for feature selection, by overcoming its limitation on parameter selection. In particular, the proposed top-k stability selection includes the original stability selection method as the special case k = 1. Moreover, we show that top-k stability selection is more robust than the original stability selection because it uses more information from the selection probabilities, and that it provides stronger theoretical properties. On a large set of real clinical prediction datasets, the top-k stability selection methods outperform many existing feature selection methods, including the original stability selection. We also compare three competitive classification methods (SVM, logistic regression, and random forest) to demonstrate the effectiveness of the features selected by our proposed method in clinical prediction applications. Finally, through several clinical applications on predicting heart failure related symptoms, we show that top-k stability selection can successfully identify important features that are clinically meaningful.
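
The abstract does not spell out how the top-k score is computed, so the following is only a minimal sketch under a stated assumption: each feature's score is taken to be the average of its k largest selection probabilities across the regularization values tried, which reduces to the maximum (the original stability selection score) when k = 1. The function names `selection_probabilities` and `top_k_stability_scores`, and the input layout, are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def selection_probabilities(
    selected: Sequence[Sequence[Sequence[bool]]],
) -> List[List[float]]:
    """Estimate selection probabilities from repeated subsampling.

    selected[b][l][j] is True if feature j was selected by the sparse
    learner on subsample b at regularization value l (hypothetical layout).
    Returns probs[l][j]: the fraction of subsamples in which feature j
    was selected at regularization value l.
    """
    n_sub = len(selected)
    n_lam = len(selected[0])
    n_feat = len(selected[0][0])
    return [
        [sum(selected[b][l][j] for b in range(n_sub)) / n_sub
         for j in range(n_feat)]
        for l in range(n_lam)
    ]


def top_k_stability_scores(probs: Sequence[Sequence[float]], k: int) -> List[float]:
    """Score each feature by the mean of its k largest selection probabilities.

    Assumption (not confirmed by the abstract): this is how the top-k
    score is defined. With k = 1 the score is the maximum probability
    over regularization values, i.e. the original stability score.
    """
    n_lam = len(probs)
    if not 1 <= k <= n_lam:
        raise ValueError("k must be between 1 and the number of regularization values")
    n_feat = len(probs[0])
    scores = []
    for j in range(n_feat):
        # Sort this feature's probabilities from largest to smallest.
        column = sorted((probs[l][j] for l in range(n_lam)), reverse=True)
        scores.append(sum(column[:k]) / k)
    return scores


# Toy example: 3 regularization values, 2 features.
example_probs = [[0.9, 0.2], [0.5, 0.4], [0.1, 0.3]]
scores_k1 = top_k_stability_scores(example_probs, k=1)  # max per feature
scores_k2 = top_k_stability_scores(example_probs, k=2)  # mean of two largest
```

Features would then be ranked by these scores. Under this assumption, a larger k rewards features that are selected consistently across several regularization values, not at a single one.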

Similar resources

Patient Risk Prediction Model via Top-k Stability Selection

The patient risk prediction model aims at assessing the risk of a patient in developing a target disease based on his/her health profile. As electronic health records (EHRs) become more prevalent, a large number of features can be constructed in order to characterize patient profiles. This wealth of data provides unprecedented opportunities for data mining researchers to address important biome...

Utilizing electronic health records to predict acute kidney injury risk and outcomes: workgroup statements from the 15th ADQI Consensus Conference

The data contained within the electronic health record (EHR) is "big" from the standpoint of volume, velocity, and variety. These circumstances and the pervasive trend towards EHR adoption have sparked interest in applying big data predictive analytic techniques to EHR data. Acute kidney injury (AKI) is a condition well suited to prediction and risk forecasting; not only does the consensus defi...

Using electronic health record collected clinical variables to predict medical intensive care unit mortality

BACKGROUND Clinical decision support systems are used to help predict patient stability and mortality in the Intensive Care Unit (ICU). Accurate patient information can assist clinicians with patient management and in allocating finite resources. However, systems currently in common use have limited predictive value in the clinical setting. The increasing availability of Electronic Health Recor...

Disease Prediction Based on Prior Knowledge

Increasing demand for digitalization of Electronic Health Records results in increased demand for effective data mining solutions. In this study we enhance the classical Support Vector Machine Recursive Feature Elimination (SVM-RFE) approach to optimally estimate disease risk from hospital discharge record data. Our approach is based on incorporating prior knowledge from human disease networks ...

Using big data to improve cardiovascular care and outcomes in China: a protocol for the CHinese Electronic health Records Research in Yinzhou (CHERRY) Study

INTRODUCTION Data based on electronic health records (EHRs) are rich with individual-level longitudinal measurement information and are becoming an increasingly common data source for clinical risk prediction worldwide. However, few EHR-based cohort studies are available in China. Harnessing EHRs for research requires a full understanding of data linkages, management, and data quality in large ...

Publication date: 2012